Rust: Handle Deref trait in type inference and data flow
#20987
| /** | ||
| * Index assignments like `a[i] = rhs` are treated as `*a.index_mut(i) = rhs`, | ||
| * so they should in principle be handled by `referenceAssignment`. | ||
| * | ||
| * However, this would require support for [generalized reverse flow][1], which | ||
| * is not yet implemented, so instead we simulate reverse flow where it would | ||
| * have applied via the model for `<_ as core::ops::index::IndexMut>::index_mut`. | ||
| * | ||
| * The same is the case for compound assignments like `a[i] += rhs`, which are | ||
| * treated as `(*a.index_mut(i)).add_assign(rhs)`. | ||
| * | ||
| * [1]: https://github.com/github/codeql/pull/18109 | ||
| */ |
Check warning
Code scanning / CodeQL
Predicate QLDoc style Warning
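The treatment of index assignments described in the QLDoc above can be checked in plain Rust. The sketch below is only an illustration: the `Slots` type and the function names are hypothetical and do not come from the PR. It compares the surface syntax with the spelled-out `index_mut` and `add_assign` calls.

```rust
use std::ops::{AddAssign, Index, IndexMut};

/// Hypothetical container, used only for illustration.
struct Slots {
    items: Vec<i64>,
}

impl Index<usize> for Slots {
    type Output = i64;
    fn index(&self, i: usize) -> &i64 {
        &self.items[i]
    }
}

impl IndexMut<usize> for Slots {
    fn index_mut(&mut self, i: usize) -> &mut i64 {
        &mut self.items[i]
    }
}

/// Uses the surface syntax for plain and compound index assignment.
fn with_sugar() -> Vec<i64> {
    let mut a = Slots { items: vec![1, 2, 3] };
    a[0] = 10;
    a[1] += 5;
    a.items
}

/// Spells out the method calls that the QLDoc describes.
fn desugared() -> Vec<i64> {
    let mut a = Slots { items: vec![1, 2, 3] };
    *a.index_mut(0) = 10;
    (*a.index_mut(1)).add_assign(5);
    a.items
}

fn main() {
    // Both forms leave the container in the same state.
    assert_eq!(with_sugar(), desugared());
    println!("{:?}", desugared());
}
```

In both forms the written value reaches the container only through the reference returned by `index_mut`, which is why the QLDoc says the case would in principle belong to `referenceAssignment`.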
| pragma[nomagic] | ||
| Type getACandidateReceiverTypeAtSubstituteLookupTraits( | ||
| string derefChain, boolean borrow, TypePath path | ||
| Type getANonPseudoCandidateReceiverTypeAt( |
Check warning
Code scanning / CodeQL
Missing QLDoc for parameter Warning
Pull request overview
This PR adds comprehensive support for Rust's Deref trait in both type inference and data flow analysis. The implementation enables proper resolution of method calls through implicit dereference chains and inserts appropriate data flow nodes for such operations.
Key changes:
- Introduces the `DerefChain` class to track chains of implicit dereferences during type inference
- Adds data flow support for implicit `deref` calls with synthetic nodes
- Resolves numerous test expectations that were previously marked as `MISSING`
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| `shared/util/codeql/util/UnboundList.qll` | Makes `getElement` public to support the `DerefChain` implementation |
| `shared/typeinference/codeql/typeinference/internal/TypeInference.qll` | Adds performance optimizations with new predicates and pragma annotations |
| `rust/ql/lib/codeql/rust/internal/typeinference/DerefChain.qll` | New file implementing deref chain logic using `UnboundList` |
| Test expectation files | Updates reflecting improved type inference and data flow (resolving `MISSING` cases) |
| Test source files | Updates comments to reflect resolved test cases (removing `MISSING` markers) |
paldepind
left a comment
Looks really great!
| /** | ||
| * Holds if the return type at `path` of the `deref` function, stripped of the | ||
| * leading `&`, mentions type parameter `tp` at `path`. | ||
| */ | ||
| pragma[nomagic] | ||
| predicate returnTypeStrippedMentionsTypeParameterAt(TypeParameter tp, TypePath path) { | ||
| exists(TypePath path0 | | ||
| tp = getReturnTypeMention(this.getDerefFunction()).resolveTypeAt(path0) and | ||
| path0.isCons(getRefTypeParameter(false), path) | ||
| ) | ||
| } |
Isn't the return type of the deref function stripped of the leading & just the Target type?
| /** | |
| * Holds if the return type at `path` of the `deref` function, stripped of the | |
| * leading `&`, mentions type parameter `tp` at `path`. | |
| */ | |
| pragma[nomagic] | |
| predicate returnTypeStrippedMentionsTypeParameterAt(TypeParameter tp, TypePath path) { | |
| exists(TypePath path0 | | |
| tp = getReturnTypeMention(this.getDerefFunction()).resolveTypeAt(path0) and | |
| path0.isCons(getRefTypeParameter(false), path) | |
| ) | |
| } | |
| /** | |
| * Holds if the target type of the dereference implementation mentions type | |
| * parameter `tp` at `path`. | |
| */ | |
| pragma[nomagic] | |
| predicate targetTypeParameterAt(TypeParameter tp, TypePath path) { | |
| tp = this.getAssocItem("Target").(TypeAlias).getTypeRepr().(TypeMention).resolveTypeAt(path) | |
| } |
Note, when quick eval'ing this definition on rust I see a few additional tuples, but they all look like correct results that we somehow didn't get before.
Yes, that is simpler.
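The reviewer's observation can be illustrated in plain Rust. The following is a minimal sketch, assuming a hypothetical `Wrapper` type and helper functions that are not part of the PR: the trait declares `fn deref(&self) -> &Self::Target`, so stripping the leading `&` from the return type leaves exactly the `Target` type.

```rust
use std::ops::Deref;

/// Hypothetical wrapper type, used only for illustration.
struct Wrapper<T> {
    inner: T,
}

impl<T> Deref for Wrapper<T> {
    type Target = T;

    // The trait declares `fn deref(&self) -> &Self::Target`.
    // Writing `&T` here is accepted because `Self::Target` is `T`.
    fn deref(&self) -> &T {
        &self.inner
    }
}

/// Only accepts a reference to the `Target` of `Wrapper<i32>`.
fn takes_target(r: &<Wrapper<i32> as Deref>::Target) -> i32 {
    *r
}

/// Passes the result of `deref` where `&Target` is expected.
fn read_through_deref(w: &Wrapper<i32>) -> i32 {
    takes_target(Deref::deref(w))
}

fn main() {
    let w = Wrapper { inner: 7 };
    assert_eq!(read_through_deref(&w), 7);
}
```

The same reasoning applies to the `self` parameter: its declared type is `&Self`, so stripping the `&` gives the implementing type of the `impl` block.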
| private SelfParam getSelfParam() { result = this.getDerefFunction().getSelfParam() } | ||
|
|
||
| /** | ||
| * Resolves the type at `path` of the `self` parameter inside the `deref` function, | ||
| * stripped of the leading `&`. | ||
| */ | ||
| pragma[nomagic] | ||
| Type resolveSelfParamTypeStrippedAt(TypePath path) { | ||
| exists(TypePath path0 | | ||
| result = getSelfParamTypeMention(this.getSelfParam()).resolveTypeAt(path0) and | ||
| path0.isCons(getRefTypeParameter(false), path) | ||
| ) | ||
| } |
Isn't the type at the self parameter of the deref function stripped of the leading & just the same as the implementing type of the impl block?
| private SelfParam getSelfParam() { result = this.getDerefFunction().getSelfParam() } | |
| /** | |
| * Resolves the type at `path` of the `self` parameter inside the `deref` function, | |
| * stripped of the leading `&`. | |
| */ | |
| pragma[nomagic] | |
| Type resolveSelfParamTypeStrippedAt(TypePath path) { | |
| exists(TypePath path0 | | |
| result = getSelfParamTypeMention(this.getSelfParam()).resolveTypeAt(path0) and | |
| path0.isCons(getRefTypeParameter(false), path) | |
| ) | |
| } | |
| /** Gets the type of the implementing type at `path`. */ | |
| Type resolveSelfTypeAt(TypePath path) { result = resolveImplSelfTypeAt(this, path) } | |
| } | ||
|
|
||
| pragma[nomagic] | ||
| private predicate satisfiesConstraintTypeMention1( |
Could we not inline satisfiesConstraintTypeMention1Inline in satisfiesConstraintTypeMention1Through and then define satisfiesConstraintTypeMention1 in terms of satisfiesConstraintTypeMention1Through?
I actually did that originally, but it results in bad performance, since satisfiesConstraintTypeMention1 will then have very high tuple duplication.
I see, thanks.
| } | ||
|
|
||
| pragma[nomagic] | ||
| private predicate satisfiesConstraintTypeMention1Through( |
What's the meaning of the 1 here? There is no variant without a 1.
I'll get rid of the 1.
| TStructFieldContent(StructField field) or | ||
| TTupleFieldContent(TupleField field) { | ||
| Stages::DataFlowStage::ref() and | ||
| exists(tupleFieldApprox(field)) |
Will this not always hold?
No, not for fields belonging to union types.
| exists(string regexp | | ||
| regexp = "^(.*);(.*)$" and | ||
| derefChain = derefChainBorrow.regexpCapture(regexp, 1) and | ||
| borrow.toString() = derefChainBorrow.regexpCapture(regexp, 2) | ||
| ) |
This also works. But maybe it is somehow slower? It feels like something the evaluator could implement with similar efficiency though.
| exists(string regexp | | |
| regexp = "^(.*);(.*)$" and | |
| derefChain = derefChainBorrow.regexpCapture(regexp, 1) and | |
| borrow.toString() = derefChainBorrow.regexpCapture(regexp, 2) | |
| ) | |
| derefChainBorrow = derefChain + ";" + borrow.toString() |
I suspect that will be slower.
| * as long as the method cannot be resolved in an earlier candidate type, and possibly | ||
| * applying a borrow at the end. | ||
| * | ||
| * The string `derefChain` encodes the sequence of dereferences, and `borrows` indicates |
| * The parameter `derefChain` encodes the sequence of dereferences, and `borrows` indicates |
| MkMethodCallDerefCand(MethodCall mc, DerefChain derefChain) { | ||
| mc.supportsAutoDerefAndBorrow() and | ||
| mc.hasNoCompatibleTargetMutBorrow(derefChain) and | ||
| exists(mc.getACandidateReceiverTypeAtNoBorrow(derefChain, _)) |
Could we go further and demand a type at the root?
| exists(mc.getACandidateReceiverTypeAtNoBorrow(derefChain, _)) | |
| exists(mc.getACandidateReceiverTypeAtNoBorrow(derefChain, TypePath::nil())) |
| private Type getACandidateReceiverTypeAtNoBorrow(string derefChain, TypePath path) { | ||
| Type getACandidateReceiverTypeAtNoBorrow(DerefChain derefChain, TypePath path) { | ||
| result = this.getReceiverTypeAt(path) and | ||
| derefChain = "" |
To respect the abstraction represented by the DerefChain type.
| derefChain = "" | |
| derefChain.isEmpty() |
I can see a few other instances of this by searching for derefChain = "".
Yes, those are leftovers.
| } | ||
|
|
||
| pragma[nomagic] | ||
| private Type inferMethodCallTypeSelf( |
A QLDoc for this predicate would be sweet!
paldepind
left a comment
🎉
This PR adds general support for the `Deref` trait when resolving method calls. This means both supporting it when actually resolving method calls in the type inference library and inserting the implicit `deref` calls in data flow.
As usual, commit-by-commit review is encouraged.
Type inference
When resolving method calls, a set of candidate receiver types is constructed by repeatedly dereferencing the receiver using applicable `Deref` implementations. Before this PR, we had limited support, namely `&(mut) T -> T` and `String -> str`. In order to handle arbitrary chains of dereferences, we introduce a new class `DerefChain`, based on the shared `UnboundList` library, which records the chain of `deref` calls needed to resolve a method call.
After having resolved a method call, type information may also have to flow backwards through the chain of dereferences. Example:
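As a hypothetical illustration (the `Wrapper` type and the function names are assumptions, not code from the PR), both directions can be sketched in plain Rust: `forward` resolves a method through a chain of user-defined `Deref` implementations, and `backward` shows the argument of the resolved method fixing a type that was left open on the receiver.

```rust
use std::ops::Deref;

/// Hypothetical wrapper with a user-defined `Deref` implementation.
struct Wrapper<T>(T);

impl<T> Deref for Wrapper<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

/// Forward direction: `len` is not defined on `Wrapper`, so the receiver is
/// dereferenced step by step until a type with a `len` method is reached.
fn forward() -> usize {
    let w = Wrapper(Wrapper(String::from("abc")));
    // Candidate chain: Wrapper<Wrapper<String>> -> Wrapper<String> -> String
    w.len()
}

/// Backward direction: the element type of the vector is unknown when `w`
/// is created and is only determined by the resolved method call.
fn backward() -> bool {
    let w = Wrapper(Vec::new());
    // Candidate chain: Wrapper<Vec<_>> -> Vec<_> -> [_]
    // `contains` is a slice method. Its argument type `&u8` fixes the
    // element type, so `w` is inferred to be `Wrapper<Vec<u8>>`.
    w.contains(&1u8)
}

fn main() {
    println!("{} {}", forward(), backward());
}
```

In `backward`, the element type learned at the end of the chain has to travel back through each dereference step to reach the type of `w`, which is the situation the paragraph above describes.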
Support for this is implemented in the `inferMethodCallTypeSelf` predicate, where the `DerefChain` is applied in reverse order, peeling off the top element until the chain becomes empty.
Data flow
A method call `x.m()` with an implicit dereference desugars to `(*Deref::deref(&x)).m()`, so we need to add data flow nodes for `&x`, `Deref::deref(&x)`, and `*Deref::deref(&x)`, as well as the implicit call to `Deref::deref`. This means we will have a reference store-step from `x` to `&x` and a reference read-step from `Deref::deref(&x)` to `*Deref::deref(&x)`.
The three different data flow nodes are represented by a state called `ImplicitDerefNodeState`, and since we need to support arbitrary dereference chains, each synthetic node is additionally tagged with a `DerefChain` as well as an index into that chain.
A small, but important, performance improvement is made when the targeted `deref` method is one of the two built-in implementations; in this case, we can add a local flow step directly from `x` to `Deref::deref(&x)`, which avoids the need for inter-procedural flow.
Evaluation
The changes in this PR resolve a lot of `MISSING` test expectations. DCA looks really great: `Percentage of calls with call target` increases by 2 percentage points, and as a consequence, we gain quite a lot of new results, all without regressing on performance. I also conducted a very positive QA experiment, which confirms the increase in alerts, e.g. 10 % more `cleartext-logging` results, 25 % more `log-injection` results, and a staggering 100 % more `path-injection` results.
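For reference, the desugaring of `x.m()` to `(*Deref::deref(&x)).m()` described in the data flow part can be written out in plain Rust. This is a minimal sketch with hypothetical `Handle` and `Payload` types that are not taken from the PR; the three intermediate values in `explicit` correspond to the three synthetic nodes.

```rust
use std::ops::Deref;

/// Hypothetical target type that defines the method `m`.
struct Payload {
    value: i32,
}

impl Payload {
    fn m(&self) -> i32 {
        self.value
    }
}

/// Hypothetical smart-pointer-like type that dereferences to `Payload`.
struct Handle {
    payload: Payload,
}

impl Deref for Handle {
    type Target = Payload;
    fn deref(&self) -> &Payload {
        &self.payload
    }
}

/// The method call as written in source code.
fn implicit() -> i32 {
    let x = Handle { payload: Payload { value: 42 } };
    x.m()
}

/// The same call with every implicit step spelled out.
fn explicit() -> i32 {
    let x = Handle { payload: Payload { value: 42 } };
    let borrowed: &Handle = &x; // corresponds to `&x`
    let derefed: &Payload = Deref::deref(borrowed); // corresponds to `Deref::deref(&x)`
    (*derefed).m() // receiver corresponds to `*Deref::deref(&x)`
}

fn main() {
    assert_eq!(implicit(), explicit());
}
```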